Effects of stand age, moisture class, and fire severity on legacy C retention in boreal forest soils post burning event

A replication of data analysis presented in Walker et al (2019) ‘Increasing wildfires threaten historic carbon sink of boreal forest soils’.

Required Packages:

library(curl)
library(ggplot2)
library(lmtest)
library(effects)
library(broom)

Required Data:

s <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Radiocarbon_site_data.csv')
site.data <- read.csv(file = s, header = TRUE, stringsAsFactors = TRUE) #general site information
d <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Soil_radiocarbon_data.csv')
soil.data <- read.csv(file = d, header = TRUE, stringsAsFactors = TRUE) #carbon-14 quantities 

Background

Boreal regions have extremely thick soil organic layers due to their relatively slow rates of decomposition. Meaning they can be extremely effective in storing and retaining large amounts of carbon, preventing C from being emitted to the atmosphere and contributing to the formation of greenhouse gases.

When wildfires sweep through boreal regions, they combust the exposed soil organic layer, releasing stored C to the atmosphere. Due to the thick nature of these soil organic layers (SOLs), combustion is restricted to a proportion of the entire SOL. The portion that is unaffected by the most recent burning event is expected to retain ‘legacy carbon’.

With increasing frequency and severity of wildfire events as a result of global warming, these legacy carbon pools are under threat.

This analysis uses relative radiocarbon dating of the carbon-14 isotope to assess the proportion of sites across the Northwest Territory, Canada that possessed and retained their legacy C pools after burning events in 2014. Legacy C was identified (through radiocarbon dating) as any organic soil C that is older than the stand age (time since previous burning event).

Experimental Design

In 32 plots \(\Delta^{14}C\) (parts per thousand difference between sample carbon-14:carbon-12 and the international standard ratio) was measured at several soil depth increments. The 32 plots were assigned a moisture class (based on their topography, drainage, presence of permafrost and their soil class) and a stand age class (young < 60 years since previous burning event, old > 70 years, based on the historic fire return interval of North American northwest coast boreal ecosystems).

Soil samples were dated (based on \(\Delta^{14}C\) values) relative to a known historic peak C level from nuclear bomb testing in the area (further known as bomb peak).

Site establishment was dated based on tree rings.

Hypotheses

The most significant predictors of legacy C presence and retention will be soil moisture class and stand age as these are reported to be the most influential factors contributing to SOL combustion and accumulation.

  1. Legacy C will be present more frequently in wet ecosystems (high soil organic matter (SOM) C accumulation due to low decomposition rates and low burn severity). Legacy C will be present less frequently (if at all) in dry ecosystems (low SOM accumulation and higher severity burns)
  2. Legacy C is more likely to combust in historically moist (moderate) ecosystems due to their larger legacy C pools (ability to harbor legacy C previously) and their higher burn risk under drought and increasing wildfire severity.
  3. Legacy C is more likely to combust in young stands because the natural protective layer of SOM has not yet re-accumulated.

Presence of Legacy C

To identify plots that possess legacy C, the \(\Delta^{14}C\) values of the basal organic soil layer were compared to \(\Delta^{14}CO_2\) values of the atmosphere during the year of stand establishment.

Legacy C was considered present if the base of the SOL was older than the year of stand establishment (meaning \(\Delta^{14}C\) lower (having decayed for longer) than \(\Delta^{14}CO_2\)).

Therefore the first step is to extract basal SOL C values from the larger data set.

basal.layer <- soil.data[soil.data$increment_location == 'BOTTOM', ]
basal.layer <- basal.layer[duplicated(basal.layer$burned_site_plot) == FALSE,] 

Next we determine and label the presence/absence of legacy carbon for each plot.

basal.layer$LC <- NULL
for (i in 1:nrow(basal.layer)) {
  if (basal.layer$Delta14C[i] < basal.layer$X14C_yr_stand_age[i]) {
    basal.layer$LC[i] <- 'Present'
  }
  else {
    basal.layer$LC[i] <- 'Absent'
  }
}
sum(basal.layer$LC == 'Present') #number of sites determined to possess legacy C 
## [1] 19

Data Visualization

For visualization, we must manipulate the data, combine the carbon measurements of the basal soil layer and the atmosphere at stand establishment into a single variable, for the ggplot object.

basal.carbon <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'Delta14C', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')]
basal.carbon$type <- 'Soil.Base'
names(basal.carbon) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class','Stand.Class','LC', 'Location')
basal.stand.age <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'X14C_yr_stand_age', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')]
basal.stand.age$type <- 'Stand.Age'
names(basal.stand.age) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class', 'Stand.Class','LC', 'Location')
legacy.carbon <- rbind(basal.carbon, basal.stand.age)

And divide the legacy carbon data set into pre and post bomb peak sites.

legacy.carbon.pre <- legacy.carbon[legacy.carbon$Stand.Class == 'pre',]
legacy.carbon.post <- legacy.carbon[legacy.carbon$Stand.Class == 'post',]

Then we can plot basal layer \(\Delta^{14}C\) against atmospheric \(\Delta^{14}CO_2\) at time of stand establishment (pre and post bomb peak) (Figure 2 from original paper)

ggplot(legacy.carbon.pre, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon')

Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated pre-bomb peak (pre 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.

ggplot(legacy.carbon.post, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = 'dashed')

Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated post-bomb peak (post 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.

We can also visualize the relationship between C and our expected predictor variables.

log.r.data <- cbind(basal.layer, site.data)
log.r.data <- log.r.data[,-c(21,22)]
ggplot(data = log.r.data, aes(x = LC, y = stand_age_rings)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy C v. Stand Age', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')

Caption: Legacy C class (presence or absence) plotted against the sites stand age (time since stand establishment, or time since previous burning event). The dashed horizontal line at 60 years represents the cut off point for age class (young or old). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). Legacy C presence (red) was more likely in younger stands where Legacy C was absent (green) in only older stands.

We can see from comparing the basal layer C levels of old and young plots that they have similar \(\Delta^{14}C\) levels which indicates they both underwent similar cycles of accumulation/combustion in previous burn cycles. Therefore the large proportion of young stands harboring legacy C could be due to the young soils not yet re-accumulating their C stocks.

ggplot(data = log.r.data, aes(x = stand_age_class, y = Delta14C)) + geom_boxplot(aes(color = stand_age_class)) + theme_bw() + labs(title = 'Stand Age Class v. Delta 14 C', x = 'Stand Age', y = 'Delta 14 C (parts per thousand)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Stand age class (old = green and young = red) plotted against basal \(\Delta^{14}C\) levels to establish historic burn cycle effects on C combustion and accumulation. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

ggplot(data = log.r.data, aes(x = LC, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy Carbon v. Prefire SOL Depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Legacy C class (presence = red and absence = green) plotted against prefire SOL depth (cm). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

You can see Legacy C is present in all plots with prefire SOL depth > 30 cm which is indicative of wetter ecosystems. Using prefire SOL depth as a proxy for moisture class is appropriate, as we can see clear trends across prefire SOL depth when plotted against mositure class.

ggplot(data = log.r.data, aes(x = moisture_class, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Moisture Class v. Prefire SOL depth', x = 'Moisture Class', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred', 'darkblue')) 

Caption: Moisture class (dry, moist and wet) plotted against the prefire SOL depth (cm). Each moisture class subset by the presence (red) or absence (green) of Legacy C. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The prefire SOL depths are distinct for each moisture class indicating prefire SOL depth is an appropriate proxy for moisture class.

Multiple Logistic Regression

We will generate a generalized linear model for this data because we cannot assume a normal distribution of our response variable or its residuals and therefore cannot fit the assumptions of a general linear model. We use a logistic regression, a form of a generalized linear model, because our response variable is binary (presence or absence of legacy C) so we must use the logit link function to calculate the natural log of the odds ratio between our two possible outcomes and use that output as our response variable.

When preparing the data to be placed into the glm() function we need to make sure our variables are factored to avoid fitting errors when running the regression.

log.r.data$LC <- as.factor(log.r.data$LC)
levels(log.r.data$LC)
## [1] "Absent"  "Present"

Model Selection

To evaluate the model significance we can compare model fit between a full, a reduced and a null model. In our case the full model will include the interaction terms between the predictor variables stand age and SOL depth. The reduced model will remove only the interaction term between the two predictor variables and the null model will be an intercept only model.

full.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth*stand_age_rings, family = binomial)
summary(full.presence)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth * stand_age_rings, family = binomial, 
##     data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6056  -0.9864   0.5256   0.7918   1.8665  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                        2.714911   2.336077   1.162   0.2452  
## prefire_sol_depth                 -0.057961   0.109514  -0.529   0.5966  
## stand_age_rings                   -0.042048   0.025303  -1.662   0.0965 .
## prefire_sol_depth:stand_age_rings  0.001444   0.001190   1.214   0.2249  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 34.501  on 28  degrees of freedom
## AIC: 42.501
## 
## Number of Fisher Scoring iterations: 5
reduced.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial)
summary(reduced.presence)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial, 
##     data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6482  -1.0558   0.4442   0.8946   1.7336  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        0.169088   1.172339   0.144   0.8853  
## prefire_sol_depth  0.079485   0.044768   1.775   0.0758 .
## stand_age_rings   -0.014321   0.008787  -1.630   0.1031  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 36.289  on 29  degrees of freedom
## AIC: 42.289
## 
## Number of Fisher Scoring iterations: 4
null.presence <- glm(data = log.r.data, formula = LC ~ 1, family = binomial)
summary(null.presence)
## 
## Call:
## glm(formula = LC ~ 1, family = binomial, data = log.r.data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.342  -1.342   1.021   1.021   1.021  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.3795     0.3599   1.054    0.292
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.23  on 31  degrees of freedom
## Residual deviance: 43.23  on 31  degrees of freedom
## AIC: 45.23
## 
## Number of Fisher Scoring iterations: 4

We can compare the models using a log likelihood test which utilizes a ratio of the log-likelihoods as the test statistic and utilizes the Chi-squared distribution to calculate the p-values. Therefore we can run a log-likelihood test by arguing the two models to the anova() function and additionally including test = 'Chisq' argument.

anova(reduced.presence, full.presence, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        29     36.289                     
## 2        28     34.501  1   1.7878   0.1812

Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.

We can also run a log -likelihood test using the lrtest() from the {lmtest} package

lrtest(reduced.presence, full.presence) 
## Likelihood ratio test
## 
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   3 -18.145                     
## 2   4 -17.251  1 1.7878     0.1812

Here the higher log likelihood value indicates a greater model fit, but the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.

Next, we must compare the accepted model, the reduced model, against the null model

anova(null.presence, reduced.presence, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: LC ~ 1
## Model 2: LC ~ prefire_sol_depth + stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        31     43.230                       
## 2        29     36.289  2   6.9407  0.03111 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.

Interpretation

Our slope estimate (\(\beta1\)) now represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.

The coefficient/estimate for prefire SOL depth was positive. Therefore increasing prefire SOL depth results in an increased likelihood of legacy C being present. For every one cm of prefire SOL depth, the probability of legacy being present increases by 0.079485. Where the estimate for stand age is negative indicating every year the stand ages the likelihood of legacy C being present decreased by 0.014321. Even though neither of these relationships showed a significant p-value.

In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function.

We can find the reverse of the logit link function from the model

fam <- family(reduced.presence)
rev.logit <- fam$linkinv #eta function
SOL.depth.change <- rev.logit(reduced.presence$coefficients[2])
stand.age.change <- rev.logit(reduced.presence$coefficients[3])

SOL.depth.change
## prefire_sol_depth 
##         0.5198608
stand.age.change
## stand_age_rings 
##       0.4964197

1 unit increase in SOL depth results in a 52 % increase in the likelihood of Legacy C presence 1 unit increase in stand age results in a 49% decrease in the likelihood of Legacy C presence

Hypothesis Testing

To test our null hypothesis that the response variable (Legacy C presence) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.

Calculating the Wald Statistic for SOL depth

library(broom)
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.07581763

for stand age

library(broom)
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.896879

Again neither predictor variable seems to show a significant p-value.

Confidence Intervals

Finally, we can calculate the confidence intervals around our estimates

CI <- confint.default(reduced.presence, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood 
CI
##                          2.5 %      97.5 %
## (Intercept)       -2.128654035 2.466829843
## prefire_sol_depth -0.008258719 0.167229045
## stand_age_rings   -0.031542971 0.002900075

Calculating the CIs for the actual odds ratios

SOL.depth.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age 

SOL.depth.changeCI
##     2.5 %    97.5 % 
## 0.4979353 0.5417101
stand.age.changeCI
##     2.5 %    97.5 % 
## 0.4921149 0.5007250

Model Predictions

Before we create the ggplot object we can transform the factored response variable (presence) into a binary value for easier data plotting.

log.r.data$LC.bin <- NULL 
for (i in 1:nrow(log.r.data)) {
  if (log.r.data$LC[i] == 'Present') {
    log.r.data$LC.bin[i] <- 1
  }
  else {
    log.r.data$LC.bin[i] <- 0
  }
}

Although we ran a model with multiple predictor variables, it can be helpful to visualize the predicted likelihood of legacy C presence by each separate predictor variable.

Therefore, we have to generate a new model for each variable.

glm.sol.depth <- glm(data= log.r.data, formula = LC ~ prefire_sol_depth, family = 'binomial')
summary(glm.sol.depth)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth, family = "binomial", data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5716  -1.1258   0.4911   1.0939   1.3937  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       -1.09037    0.88393  -1.234   0.2174  
## prefire_sol_depth  0.06993    0.04077   1.715   0.0864 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 39.342  on 30  degrees of freedom
## AIC: 43.342
## 
## Number of Fisher Scoring iterations: 4
glm.stand.age <- glm(data = log.r.data, formula = LC ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
## 
## Call:
## glm(formula = LC ~ stand_age_rings, family = "binomial", data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4450  -1.2695   0.7322   0.9323   1.5312  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      1.650387   0.916089   1.802   0.0716 .
## stand_age_rings -0.012385   0.008052  -1.538   0.1240  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 40.635  on 30  degrees of freedom
## AIC: 44.635
## 
## Number of Fisher Scoring iterations: 4

Then we have to generate a new range of values of our predictor variables for which we will predict LC presence.

new.sol.depth <- as.data.frame(seq(min(log.r.data$prefire_sol_depth),max(log.r.data$prefire_sol_depth), length.out = 32))
names(new.sol.depth) <- 'prefire_sol_depth' 
new.stand.age <- as.data.frame(seq(min(log.r.data$stand_age_rings),max(log.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'

Finally, we can predict the LC values for the generated prefire SOL depth values.

prediction.sol.depth <- cbind(new.sol.depth,predict(glm.sol.depth, newdata = new.sol.depth, type = 'link', se = TRUE))

and generate the upper and lower confidence intervals for each prediction (using 1.96 because it is the approx critical value for 95% CI). Since we argued the response type should be in the link scale we must back transform the CIs (and the fit when plotting) to plot the values on the appropriate response scale.

prediction.sol.depth$LL <- rev.logit(prediction.sol.depth$fit - 1.96 * prediction.sol.depth$se.fit)
prediction.sol.depth$UL <- rev.logit(prediction.sol.depth$fit + 1.96 * prediction.sol.depth$se.fit)
head(prediction.sol.depth)
##   prefire_sol_depth        fit    se.fit residual.scale        LL        UL
## 1          6.340000 -0.6470421 0.6602462              1 0.1255244 0.6563432
## 2          7.799355 -0.5449953 0.6126501              1 0.1485791 0.6583160
## 3          9.258710 -0.4429484 0.5673067              1 0.1743841 0.6612731
## 4         10.718065 -0.3409016 0.5248001              1 0.2026991 0.6654565
## 5         12.177419 -0.2388548 0.4858755              1 0.2330496 0.6711655
## 6         13.636774 -0.1368080 0.4514602              1 0.2647022 0.6787545

Plot the predicted probabilities and their 95% CI alongside the actual data (Figure 3 from original paper)

p <- ggplot(data = prediction.sol.depth, aes(x= prefire_sol_depth, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.depth, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Prefire SOL depth (cm)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = prefire_sol_depth, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p

Caption: Predicted probability of Legacy C presence for prefire SOL depth. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values.

Then we must do the same for the stand age predictor.

We can predict the LC values for the generated stand age values.

prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))

and generate the upper and lower confidence intervals for each prediction.

prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
##   stand_age_rings      fit    se.fit residual.scale        LL        UL
## 1              19 1.415075 0.7790426              1 0.4720670 0.9498840
## 2              25 1.340766 0.7370696              1 0.4740507 0.9418830
## 3              31 1.266457 0.6959197              1 0.4756329 0.9327963
## 4              37 1.192148 0.6557479              1 0.4767373 0.9225432
## 5              43 1.117839 0.6167454              1 0.4772701 0.9110610
## 6              49 1.043529 0.5791484              1 0.4771156 0.8983168

Plot the predicted probabilities and their 95% CI alongside the actual data (Figure 3 from original paper)

p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = stand_age_rings, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p

Caption: Predicted probability of Legacy C presence for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values. The dashed line shows the cutoff for young or old stand age classification.

Soil surface dating relative to bomb peak

Here we will determine the surface soil establishment date relative to bomb peak by comparing \(\Delta^{14}C\) levels at the soil surface to \(\Delta^{14}C\) levels at soil depth increment of 2 cm. In natural states, depth should have less C because it is older (more C has decayed & surface layers accumulate vertically). Therefore, if C levels are higher at the surface than at the 2cm depth then the surface is dated as pre bomb peak. If the soil surface has less C than the 2cm depth, the surface layer is determined to be established post bomb peak. These classifications will be important for when we determine legacy C combustion later in the analysis.

Extract surface layer measurements

surface.layer <- soil.data[soil.data$increment_location == 'TOP',]
surface.layer$soil_depth_increment <- as.factor(surface.layer$soil_depth_increment)

Assigning stand age class (young; < 60 years or old > 70 years) based on the tree ring dates.

surface.layer$Age <- NULL
for (i in 1:nrow(surface.layer)) {
  if (surface.layer$stand_age_rings[i] < 60) {
    surface.layer$Age[i] <- 'Young'
  }
  else {
    surface.layer$Age[i] <- 'Old'
  }
}

Data Visualization

Here we plot the \(\Delta^{14}C\) at both soil increments to determine the relative soil surface dates

ggplot(surface.layer, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = soil_class_pre.post_bomb)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth', linetype = 'Soil Surface Est. relative to bomb peak')

We can also graph the pre-bomb and post-bomb sites separately to show the relationship of \(\Delta^{14}C\) between the 2 soil increments more clearly.

Divide into pre and pot bomb peak data.

surface.layer.pre <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'pre', ]
surface.layer.post <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'post', ]

Plot soil surface (soil depth increment of 1) to soil depth increment 2. (pre and post bomb peak) (Figure 2 from original paper)

ggplot(surface.layer.pre, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated pre-bomb peak (pre 1966). Dashed lines represent young sites. Solid lines represent old sites.

ggplot(surface.layer.post, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated post-bomb peak (post 1966). Dashed lines represent young sites. Solid lines represent old sites.

Combustion of Legacy C

To assess if legacy C was combusted during the burning event the \(\Delta^{14}C\) values of the soil surface samples was compared to atmospheric \(\Delta^{14}CO_2\) values at the time of stand establishment (in plots where Legacy C was considered present). Legacy C was considered combusted if the soil surface was older than the stand age.

Therefore legacy C was combusted in the plot IF:

  1. Soil surface C levels were lower than atmospheric stand age \(CO_2\) levels in sites where the stand age was pre bomb peak AND the soil surface was pre bomb peak. (natural state, so if the soil surface was older, it would have less C than atmosphere at the time of site establishment)

  2. The stand age was dated post bomb peak AND the soil surface was dated pre bomb peak

  3. The soil surface C levels were higher than atmospheric stand age \(CO_2\) levels in sites where the stand age was post bomb peak AND the soil surface was post bomb peak.

Note: legacy C was not combusted if the stand age was pre bomb peak and the surface was post bomb peak.

First we can extract the soil surface \(\Delta^{14}C\) values.

top.surface.layer<- surface.layer[surface.layer$soil_depth_increment == 1,]
top.surface.layer <- top.surface.layer[duplicated(top.surface.layer$burned_site_plot) == FALSE,] 

Then extract plots where Legacy C was present.

top.surface.layer$LP <- basal.layer$LC
legacy.plots <- top.surface.layer[top.surface.layer$LP == 'Present',]

Then we must identify and label combustion status for each plot.

legacy.plots$LC <- NULL 
for (i in 1:nrow(legacy.plots)) {
  if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre'){
    if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
      legacy.plots$LC[i] <- 'Combusted'
    }
    else {
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
  }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre') {
      legacy.plots$LC[i] <- 'Combusted'
    }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
       if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
    else {
      legacy.plots$LC[i] <- 'Combusted'
    }
  }
}

Data Visualization

Manipulate the data for the ggplot object

surface.carbon <- legacy.plots[,-17]
surface.carbon$Location <- 'Soil.Surface'
colnames(surface.carbon)[12] <- 'Delta14C'
surface.stand.age <- legacy.plots[,-12]
surface.stand.age$Location <- 'Stand.Age'
colnames(surface.stand.age)[16] <- 'Delta14C'

legacy.combustion <- rbind(surface.carbon, surface.stand.age)

Plot the soil surface C levels against the atmospheric \(CO_2\) at the time of stand establishment for all of the sites

ggplot(legacy.combustion, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Here we can’t see any clear relationship between the \(\Delta^{14}C\) levels of each location. It would be more beneficial to subset the sites into their soil and stand age relative to the bomb peak to really see patterns in how legacy C combustion was determined.

So first we can subset the legacy plot data set

legacy.combustion.pre.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.post.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.pre.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'post',]
legacy.combustion.post.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'post',]

Then plot each set of site types separately (Figure 2 from original paper)

ggplot(legacy.combustion.pre.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.post.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.pre.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.post.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') 

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

Similar to when we were assessing legacy C presence or absence, we can look at the combustion status relationship with our expected predictor variables.

First we must extract the appropriate plots (where Legacy C was present)

combustion.r.data <- log.r.data[log.r.data$LC == 'Present',]
combustion.r.data$combustion <- legacy.plots$LC
ggplot(data = combustion.r.data, aes(x = combustion, y = prefire_sol_depth)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Prefire SOL depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Legacy C (combusted = green and not combusted = red) plotted against prefire SOL depth. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

ggplot(data = combustion.r.data, aes(x = combustion, y = stand_age_rings)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Stand Age ', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')

Caption: Legacy C (combusted = green and not combusted = red) plotted against stand age. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The horizontal dashed line at 60 years represents the cutoff for stand age class (young/old)

From both of these plots we can see no evidence for combustion in old sites, and drier and younger sites combusted more frequently.

Multiple Logistic Regression

To test our hypotheses for Legacy C combustion, we again must use a generalized linear model using the log link function as our response variable of combustion status is also binary. In this case our model will look at the predictor variables of stand age and the proportion of the soil organic layer burned (as a proxy for fire severity)

So first we must factor our response variable and calculate the proportion of the soil organic layer burned from the burn depth and the prefire sol depth.

combustion.r.data$combustion <- factor(combustion.r.data$combustion, levels = c('Not.Combusted', 'Combusted'))
levels(combustion.r.data$combustion) #important to factor in the correct order for glm model interpretation
## [1] "Not.Combusted" "Combusted"
combustion.r.data$proportion_sol_burned <- combustion.r.data$burn_depth/combustion.r.data$prefire_sol_depth

Then we can move onto model selection

Model Selection

To evaluate the model significance we can compare model fit between a full, a reduced and a null model. In our case the full model will include the interaction terms between the predictor variables stand age and SOL depth. The reduced model will remove only the interaction term between the two predictor variables and the null model will be an intercept only model.

full.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned * stand_age_rings, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(full.combustion) 
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned * stand_age_rings, 
##     family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.26155  -0.07383   0.00000   0.01511   1.67247  
## 
## Coefficients:
##                                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            20.0540    16.9873   1.181    0.238
## proportion_sol_burned                 -13.5919    19.1339  -0.710    0.477
## stand_age_rings                        -0.6923     0.5185  -1.335    0.182
## proportion_sol_burned:stand_age_rings   0.7773     0.6005   1.294    0.196
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.9007  on 18  degrees of freedom
## Residual deviance:  7.5533  on 15  degrees of freedom
## AIC: 15.553
## 
## Number of Fisher Scoring iterations: 10
reduced.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned + stand_age_rings, family = binomial)
summary(reduced.combustion) 
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned + stand_age_rings, 
##     family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.39200  -0.48671  -0.03575   0.11308   1.79539  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)  
## (Intercept)           -3.78147    2.99380  -1.263   0.2066  
## proportion_sol_burned 12.52782    7.38223   1.697   0.0897 .
## stand_age_rings       -0.05758    0.03494  -1.648   0.0993 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 10.708  on 16  degrees of freedom
## AIC: 16.708
## 
## Number of Fisher Scoring iterations: 7
null.combustion <- glm(data = combustion.r.data, formula = combustion ~ 1, family = binomial)
summary(null.combustion) 
## 
## Call:
## glm(formula = combustion ~ 1, family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7815  -0.7815  -0.7815   0.4263   1.6340  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -1.030      0.521  -1.976   0.0481 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 21.901  on 18  degrees of freedom
## AIC: 23.901
## 
## Number of Fisher Scoring iterations: 4

We can compare the models using a log likelihood test which utilizes a ratio of the log-likelihoods as the test statistic and utilizes the Chi-squared distribution to calculate the p-values. Therefore we can run a log-likelihood test by arguing the two models to the anova() function and additionally including test = 'Chisq' argument.

anova(reduced.combustion, full.combustion, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        16    10.7085                       
## 2        15     7.5533  1   3.1552  0.07569 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.

We can also run a log -likelihood test using the lrtest() from the {lmtest} package

lrtest(reduced.combustion, full.combustion) 
## Likelihood ratio test
## 
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1   3 -5.3542                       
## 2   4 -3.7767  1 3.1552    0.07569 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the higher log likelihood value indicates a greater model fit, but again the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.

Next, we must compare the accepted model, the reduced model, against the null model

anova(null.combustion, reduced.combustion, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: combustion ~ 1
## Model 2: combustion ~ proportion_sol_burned + stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1        18     21.901                        
## 2        16     10.709  2   11.192 0.003712 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.

Interpretation

Our slope estimate (\(\beta1\)) now represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.

The coefficient/estimate for the proportion of the soil organic layer was positive. Therefore increasing proportions of the SOL burned results in an increased likelihood of legacy C being combusted. For every one unit of proportion of SOL burned, the probability of legacy C being combusted increases by 12.52782. Where the estimate for stand age is negative indicating every year the stand ages, the likelihood of legacy C being combusted decreases by 0.05758. Even though neither of these relationships showed a significant p-value.

In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function.

prop.sol.burned.change <- rev.logit(reduced.combustion$coefficients[2]) 
stand.age.change <- rev.logit(reduced.combustion$coefficients[3])

prop.sol.burned.change
## proportion_sol_burned 
##             0.9999964
stand.age.change
## stand_age_rings 
##       0.4856079

1 unit increase in SOL depth results in a 99% increase in the likelihood of Legacy C combustion 1 unit increase in stand age results in a 49% decrease in the likelihood of Legacy C combustion

Hypothesis Testing

To test our null hypothesis that the response variable (Legacy C combustion) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.

Calculating the Wald Statistic for proportion of SOL burned

modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.08969244

for stand age

modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.900684

Again neither predictor variable seems to show a significant p-value.

Confidence Intervals

Finally, we can calculate the confidence intervals around our estimates

CI <- confint.default(reduced.combustion, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood 
CI
##                            2.5 %      97.5 %
## (Intercept)           -9.6492143  2.08628045
## proportion_sol_burned -1.9410968 26.99672747
## stand_age_rings       -0.1260621  0.01089314

Calculating the CIs for the actual odds ratios

prop.sol.burned.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age 

prop.sol.burned.changeCI
##     2.5 %    97.5 % 
## 0.1255274 1.0000000
stand.age.changeCI
##     2.5 %    97.5 % 
## 0.4685261 0.5027233

Model Predictions

Transforming the factored combustion variable into a binary value for the ggplot object

combustion.r.data$combust.bin <- NULL 
for (i in 1:nrow(combustion.r.data)) {
  if (combustion.r.data$combustion[i] == 'Combusted') {
    combustion.r.data$combust.bin[i] <- 1
  }
  else {
    combustion.r.data$combust.bin[i] <- 0
  }
}

To visualize the predicted likelihood of legacy C combustion by each separate predictor variable we must first generate a new model for each variable.

glm.sol.burned <- glm(data= combustion.r.data, formula = combustion ~ proportion_sol_burned, family = 'binomial')
summary(glm.sol.burned)
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned, family = "binomial", 
##     data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4942  -0.6229  -0.3927   0.3403   2.0983  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)  
## (Intercept)             -4.327      2.059  -2.101   0.0356 *
## proportion_sol_burned    6.293      3.454   1.822   0.0685 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 17.498  on 17  degrees of freedom
## AIC: 21.498
## 
## Number of Fisher Scoring iterations: 5
glm.stand.age <- glm(data = combustion.r.data, formula = combustion ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
## 
## Call:
## glm(formula = combustion ~ stand_age_rings, family = "binomial", 
##     data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1644  -0.8727  -0.3323   0.4042   2.2285  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)      1.23090    1.30256   0.945    0.345
## stand_age_rings -0.03154    0.01979  -1.594    0.111
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 17.318  on 17  degrees of freedom
## AIC: 21.318
## 
## Number of Fisher Scoring iterations: 6

Then generate a new range of values of our predictor variables for which we will predict LC combustion.

new.sol.burned <- as.data.frame(seq(min(combustion.r.data$proportion_sol_burned),max(combustion.r.data$proportion_sol_burned), length.out = 32))
names(new.sol.burned) <- 'proportion_sol_burned' 
new.stand.age <- as.data.frame(seq(min(combustion.r.data$stand_age_rings),max(combustion.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'

Predict the Combustion status for the generated proportion burned values

prediction.sol.burned <- cbind(new.sol.burned,predict(glm.sol.burned, newdata = new.sol.burned, type = 'link', se = TRUE))

Generate the upper and lower confidence intervals for each prediction

prediction.sol.burned$LL <- rev.logit(prediction.sol.burned$fit - 1.96 * prediction.sol.burned$se.fit)
prediction.sol.burned$UL <- rev.logit(prediction.sol.burned$fit + 1.96 * prediction.sol.burned$se.fit)
head(prediction.sol.burned)
##   proportion_sol_burned       fit   se.fit residual.scale          LL        UL
## 1             0.1799147 -3.194867 1.474844              1 0.002270320 0.4245377
## 2             0.1999821 -3.068592 1.411628              1 0.002913786 0.4251174
## 3             0.2200494 -2.942317 1.349013              1 0.003734564 0.4259849
## 4             0.2401168 -2.816042 1.287085              1 0.004779021 0.4271827
## 5             0.2601841 -2.689767 1.225949              1 0.006104368 0.4287614
## 6             0.2802515 -2.563492 1.165730              1 0.007780511 0.4307819

Plot the predicted values and their 95% CI alongside the actual data (Figure 3 from original paper)

p <- ggplot(data = prediction.sol.burned, aes(x= proportion_sol_burned, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.burned, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Proportion SOL burned", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = proportion_sol_burned, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p

Caption: Predicted probability of Legacy C combustion for the proportion of SOL burned. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values.

Then we must do the same for the stand age predictor.

We can predict the combustion status for the generated stand age values.

prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))

and generate the upper and lower confidence intervals for each prediction.

prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
##   stand_age_rings         fit    se.fit residual.scale        LL        UL
## 1        19.00000  0.63168865 0.9816617              1 0.2154522 0.9279586
## 2        24.77419  0.44958614 0.8927017              1 0.2141467 0.9001850
## 3        30.54839  0.26748363 0.8101150              1 0.2107615 0.8647492
## 4        36.32258  0.08538112 0.7360500              1 0.2046832 0.8217192
## 5        42.09677 -0.09672139 0.6733248              1 0.1952206 0.7725902
## 6        47.87097 -0.27882389 0.6253610              1 0.1817506 0.7204880

Plot the predicted values and their 95% CI alongside the actual data (Figure 3 from original paper)

p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = stand_age_rings, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p

Caption: Predicted probability of Legacy C combustion for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values.

Conclusion

So overall I was able to reproduce general trends similar to the original data analysis. As stand age increased, the likelihood of legacy C presence and combustion both decreased. Legacy C combustion was more likely as the proportion of the SOL burned and the presence of legacy C became more likely as the prefire SOL depth increased. These trends indicate that more frequent fire intervals and increased fire severity under climate warming are threats to boreal forests ability to act as C sinks and counteract some of the C emissions contributing to greenhouse gas formation.

Where I strayed was in the statistical significance of my generalized linear model results. The original authors were able to obtain significant results for all of their analyses. They did indicate that they used Firth’s logistic regression using the {logistf} package instead of the standard glm() function in the {lmtest} package to account for highly predictive covariates that are common in generalized linear models using smaller sample sizes. Yet when I tried running the analysis under that package the p-values seemed to be even less significant.